A semantic tagger for the Finnish language
Abstract
This paper reports on the current status and evaluation of a Finnish semantic tagger (hereafter FST), which was developed in the EU-funded Benedict Project. In this project, we ported the Lancaster English semantic tagger (USAS) to the Finnish language. We re-used the existing software architecture of USAS and applied to Finnish the same semantic field taxonomy that was developed for English. The Finnish lexical resources were compiled using various corpus-based techniques, and the resulting lexicons were then manually tagged and used for the FST prototype. At present, the lexicons contain 33,627 single lexical items and 8,912 multi-word expression templates. In the evaluation, we used two sets of test data. The first set is from the domain of Finnish cooking, which is both sufficiently compact and sufficiently versatile. The second set is from Helsingin Sanomat, the biggest Finnish daily newspaper. The FST achieved a lexical coverage of 94.1% and a precision of 83.03% on the cooking test data, and a lexical coverage of 90.7% on the newspaper data. While there is much room for improvement, this is an encouraging result for a prototype tool. The FST will be continually improved by expanding the semantic lexical resources and improving the disambiguation algorithms.
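The two evaluation figures reported in the abstract can be illustrated with a minimal Python sketch. The definitions used here are assumptions, since the abstract does not spell them out: lexical coverage is taken as the percentage of tokens that receive any tag other than the USAS "unmatched" tag Z99, and precision as the percentage of tokens whose assigned tag agrees with a manually checked gold tag. The function names and the toy tag sequences are hypothetical and are not taken from the FST software.

```python
from typing import Sequence

# Z99 is the USAS tag for words not found in the lexicon (unmatched).
UNMATCHED = "Z99"


def lexical_coverage(tags: Sequence[str]) -> float:
    """Percentage of tokens that received a tag other than Z99 (assumed definition)."""
    if not tags:
        return 0.0
    matched = sum(1 for tag in tags if tag != UNMATCHED)
    return 100.0 * matched / len(tags)


def precision(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of tokens whose predicted tag equals the gold tag (assumed definition)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not predicted:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return 100.0 * correct / len(predicted)


# Toy example with five tokens (hypothetical data, not from the paper).
predicted_tags = ["F1", "Z5", "O2", "Z99", "O2"]
gold_tags = ["F1", "Z5", "F1", "F1", "O2"]

coverage_pct = lexical_coverage(predicted_tags)
precision_pct = precision(predicted_tags, gold_tags)
print(f"coverage: {coverage_pct:.1f}%  precision: {precision_pct:.1f}%")
```

In this toy case four of the five tokens are matched by the lexicon and three of the five tags agree with the gold standard, so coverage exceeds precision, as it does in the cooking-domain figures above.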
Similar resources
Porting an English semantic tagger to the Finnish language
Semantic annotation is an important and challenging issue in corpus linguistics and language engineering. While such a tool is available for English in Lancaster (Wilson and Rayson 1993), few such tools have been reported for other languages. In a joint Benedict project funded by the European Community under the ‘Information Society Technologies Programme’, we have been working towards developi...
English-Russian-Finnish Cross-Language Comparison of Phrasal Verb Translation Equivalents
A phraseological expression in a language may have equivalent expressions in other languages with different morpho-syntactic structures and semantic properties. Our recent experience in the Benedict Project (EU IST-2001-34237), in which a Finnish semantic lexicon compatible with the Lancaster English semantic lexicon (Rayson et al., 2004) has been built, shows that there can exist complex cross-l...
Using a Semantic Tagger as a Dictionary Search Tool
The USAS semantic tagger is a powerful language technology tool that has proven to be very effective in various applications such as content analysis, discourse analysis and information extraction. In the Benedict project, we intend to use the semantic taggers for the English and Finnish languages as search tools in electronic dictionaries, thereby enabling users to carry out context-sensitive ...
Tagging Named Entities in 19th Century and Modern Finnish Newspaper Material with a Finnish Semantic Tagger
Named Entity Recognition (NER), the search, classification and tagging of names and name-like informational elements in texts, has become a standard information extraction procedure for textual data during the last two decades. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein familie...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a treebank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of treebank data has been important ever since the first large-scale treebank, the Penn Treebank, was published. However, although treebanks originated in computational linguistics, their value is becoming more widely appreciated in linguistics research as a whole. F...